Robots benefit from being able to classify and manipulate objects according to their material properties. This capability enables fine manipulation of complex objects through appropriate selection of grasp pose and force. Prior work has focused on haptic or visual processing to determine material type at grasp time. In this work, we introduce a novel parallel robot gripper design and a method for collecting spectral readings and visual images from within the gripper's fingers. We train a nonlinear support vector machine (SVM) that classifies the material of the object about to be grasped through recursive estimation, with increasing confidence as the distance from the fingertips to the object decreases. To validate the hardware design and classification method, we collected samples from 16 real and fake fruit varieties (the latter composed of polystyrene/plastic), resulting in a dataset of spectral curves, scene images, and high-resolution texture images captured as objects are grasped, lifted, and released. Our modeling method achieves 96.4% accuracy on a 32-class decision problem, a 29.4% improvement over state-of-the-art computer vision algorithms at distinguishing visually similar materials. In contrast to prior work, our recursive estimation model accounts for increasing spectral signal strength and allows decisions to be made as the gripper approaches an object. We conclude that spectroscopy enables robots not only to classify grasped objects but also to understand their underlying material composition.
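The recursive estimation idea in the abstract above can be sketched as a sequential Bayesian update: each new spectral frame contributes a class likelihood, and the posterior sharpens as the gripper approaches. This is a minimal illustrative sketch only; the class count, likelihood values, and update rule are assumptions, not the paper's actual model.

```python
import numpy as np

def recursive_update(posterior, likelihood):
    """One recursive Bayes step: fuse a new per-frame class likelihood."""
    post = posterior * likelihood
    return post / post.sum()

# Hypothetical 3-class example: likelihoods sharpen as the gripper nears the object.
posterior = np.ones(3) / 3  # uniform prior over material classes
frames = [
    np.array([0.40, 0.35, 0.25]),  # far away: weak, ambiguous evidence
    np.array([0.50, 0.30, 0.20]),
    np.array([0.70, 0.20, 0.10]),  # close: strong spectral signal
]
for lik in frames:
    posterior = recursive_update(posterior, lik)
print(posterior.argmax(), round(float(posterior.max()), 3))
```

Under this scheme, confidence (the posterior maximum) rises monotonically as the accumulating evidence agrees, matching the abstract's observation that confidence grows as fingertip-object distance shrinks.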
We present a regularization framework for subject transfer learning in which we seek to train an encoder and classifier to minimize classification loss, subject to a penalty measuring independence between the latent representation and the subject label. We introduce three notions of independence and corresponding penalty terms, using mutual information or divergence as proxies for independence. For each penalty term, we provide several concrete estimation algorithms, using analytical methods as well as neural critic functions. We provide a hands-off strategy for applying these diverse regularization algorithms to a new dataset, which we call "AutoTransfer". We evaluate the performance of these individual regularization strategies and our AutoTransfer method on EEG, EMG, and ECoG datasets, showing that these approaches can improve subject transfer learning on challenging real-world datasets.
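One simple analytic stand-in for the independence penalties described above is a linear cross-covariance penalty between latent features and one-hot subject labels. This is a toy proxy of my own choosing, not one of the paper's three penalty terms; the data shapes and names are illustrative.

```python
import numpy as np

def independence_penalty(z, s):
    """Squared Frobenius norm of the cross-covariance between latents z and
    one-hot subject labels s; zero when z is linearly independent of s."""
    zc = z - z.mean(axis=0)
    sc = s - s.mean(axis=0)
    cov = zc.T @ sc / len(z)
    return float((cov ** 2).sum())

rng = np.random.default_rng(0)
subjects = rng.integers(0, 4, size=256)
s = np.eye(4)[subjects]  # one-hot subject labels

# Subject-dependent latents (leak subject identity) vs. subject-independent ones.
z_dep = s @ rng.normal(size=(4, 8)) + 0.1 * rng.normal(size=(256, 8))
z_ind = rng.normal(size=(256, 8))
print(independence_penalty(z_dep, s) > independence_penalty(z_ind, s))
```

Added to the classification loss with a weight, such a term pushes the encoder toward representations that carry task information but not subject identity, which is the mechanism the framework formalizes with mutual-information and divergence proxies.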
Electromyography (EMG) data has been widely adopted as an intuitive interface for guiding human-robot collaboration. A major challenge in real-time detection of human grasp intent is identifying dynamic EMG from hand movements. Previous studies have mainly implemented steady-state EMG classification with a small number of grasp patterns in dynamic situations, which is insufficient to generate differentiated control with respect to the variations in muscle activity encountered in practice. To better detect dynamic movements, more EMG variability could be integrated into the model. However, only limited research has focused on detecting such dynamic grasp motions, and most existing evaluations of non-stationary EMG classification either require supervised ground truth for motion states or include only limited kinematic variations. In this study, we propose a framework for classifying dynamic EMG signals into gestures and examine the impact of different movement phases, using an unsupervised method to segment and label action transitions. We collected and utilized data from a large gesture vocabulary with multiple dynamic actions to encode the transitions from one grasp intent to another based on common sequences of grasp actions. A classifier for recognizing gesture labels was subsequently constructed from the dynamic EMG signals, requiring no supervised annotation of kinematic movements. Finally, we evaluated the performance of multiple training strategies using EMG data from different movement phases and explored the information revealed by each phase. All experiments were evaluated in a real-time style, with performance transitions presented over time.
Objective: For lower-arm amputees, robotic prosthetic hands promise to restore the ability to perform activities of daily living. Current control methods based on physiological signals such as electromyography (EMG) are prone to poor inference outcomes due to motion artifacts, muscle fatigue, and the like. Vision sensors are a major source of information about the environment state and can play a vital role in inferring feasible and intended gestures. However, visual evidence is also susceptible to its own artifacts, most often due to object occlusion, lighting changes, etc. Multimodal fusion of evidence measured by physiological and vision sensors is a natural approach, given the complementary strengths of these modalities. Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye gaze, and forearm EMG, each processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we also developed novel data processing and augmentation techniques to train the neural network components. Results: Our results indicate that, on average, fusion improves the instantaneous classification accuracy of the upcoming grasp type during the reaching phase by 13.66% and 14.8% relative to EMG and visual evidence alone, respectively, with an overall fusion accuracy of 95.3%. Conclusion: Our analysis of the experimental data shows that EMG and visual evidence have complementary strengths, and as a result, fusion of multimodal evidence can outperform each individual evidence modality at any given time.
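The Bayesian fusion step described above can be illustrated by combining two class posteriors under a conditional-independence assumption: multiply them, divide out one copy of the shared prior, and renormalize. The posterior values and three-class setup here are hypothetical, not the paper's numbers or network outputs.

```python
import numpy as np

def fuse(p_emg, p_vis, prior=None):
    """Naive Bayes fusion of two class posteriors assuming the modalities are
    conditionally independent given the class; one prior copy is divided out
    to avoid double-counting it."""
    k = len(p_emg)
    prior = np.ones(k) / k if prior is None else prior
    fused = p_emg * p_vis / prior
    return fused / fused.sum()

# Hypothetical grasp-type posteriors: EMG and vision each weakly favor class 0.
p_emg = np.array([0.5, 0.3, 0.2])
p_vis = np.array([0.6, 0.1, 0.3])
fused = fuse(p_emg, p_vis)
print(fused.round(3))
```

Note how two individually uncertain posteriors yield a sharper fused estimate when they agree, which is the complementarity the abstract's results quantify.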
This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset, the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We evaluated the performance of state-of-the-art text detection and recognition models on these datasets and found significant improvement in word accuracy and normalized edit distance compared to the text in COMICS. We also created a new dataset called "COMICS Text+", which contains the extracted text from the textboxes in the COMICS dataset. Using the improved text data of COMICS Text+ in the comics processing model resulted in state-of-the-art performance on cloze-style tasks without changing the model architecture. The COMICS Text+ dataset can be a valuable resource for researchers working on tasks including text detection, recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All the data and inference instructions can be accessed at https://github.com/gsoykan/comics_text_plus.
Diffractive optical networks provide rich opportunities for visual computing tasks since the spatial information of a scene can be directly accessed by a diffractive processor without requiring any digital pre-processing steps. Here we present data class-specific transformations all-optically performed between the input and output fields-of-view (FOVs) of a diffractive network. The visual information of the objects is encoded into the amplitude (A), phase (P), or intensity (I) of the optical field at the input, which is all-optically processed by a data class-specific diffractive network. At the output, an image sensor-array directly measures the transformed patterns, all-optically encrypted using the transformation matrices pre-assigned to different data classes, i.e., a separate matrix for each data class. The original input images can be recovered by applying the correct decryption key (the inverse transformation) corresponding to the matching data class, while applying any other key will lead to loss of information. The class-specificity of these all-optical diffractive transformations creates opportunities where different keys can be distributed to different users; each user can decode the acquired images of only one data class, serving multiple users in an all-optically encrypted manner. We numerically demonstrated all-optical class-specific transformations covering A-->A, I-->I, and P-->I transformations using various image datasets. We also experimentally validated the feasibility of this framework by fabricating a class-specific I-->I transformation diffractive network using two-photon polymerization and successfully tested it at 1550 nm wavelength. Data class-specific all-optical transformations provide a fast and energy-efficient method for image and data encryption, enhancing data security and privacy.
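The encrypt-with-a-class-matrix, decrypt-with-its-inverse idea above can be sketched numerically with linear algebra: a pre-assigned invertible matrix per class plays the role of the diffractive transformation, and only the matching inverse recovers the input. This is a toy digital analogue; the matrix sizes, random keys, and class labels are illustrative, not the optical implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16  # flattened size of a toy 4x4 "image"

# One pre-assigned invertible transformation matrix (key) per data class.
keys = {c: rng.normal(size=(n, n)) for c in range(3)}

img = rng.random(n)        # toy input belonging to class 1
encrypted = keys[1] @ img  # what the output sensor-array records

right = np.linalg.inv(keys[1]) @ encrypted  # matching decryption key
wrong = np.linalg.inv(keys[2]) @ encrypted  # mismatched key: information lost
print(np.allclose(right, img), np.allclose(wrong, img))
```

Distributing `inv(keys[c])` only to the user authorized for class `c` mirrors the multi-user scheme in the abstract: each key undoes exactly one class-specific transformation.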
Recent advances in distributed artificial intelligence (AI) have led to tremendous breakthroughs in various communication services, from fault-tolerant factory automation to smart cities. When distributed learning is run over a set of wirelessly connected devices, random channel fluctuations and the incumbent services running on the same network impact the performance of both distributed learning and the coexisting service. In this paper, we investigate a mixed service scenario where distributed AI workflow and ultra-reliable low latency communication (URLLC) services run concurrently over a network. Consequently, we propose a risk sensitivity-based formulation for device selection to minimize the AI training delays during its convergence period while ensuring that the operational requirements of the URLLC service are met. To address this challenging coexistence problem, we transform it into a deep reinforcement learning problem and address it via a framework based on soft actor-critic algorithm. We evaluate our solution with a realistic and 3GPP-compliant simulator for factory automation use cases. Our simulation results confirm that our solution can significantly decrease the training delay of the distributed AI service while keeping the URLLC availability above its required threshold and close to the scenario where URLLC solely consumes all network resources.
Multispectral imaging has been used for numerous applications in e.g., environmental monitoring, aerospace, defense, and biomedicine. Here, we present a diffractive optical network-based multispectral imaging system trained using deep learning to create a virtual spectral filter array at the output image field-of-view. This diffractive multispectral imager performs spatially-coherent imaging over a large spectrum, and at the same time, routes a pre-determined set of spectral channels onto an array of pixels at the output plane, converting a monochrome focal plane array or image sensor into a multispectral imaging device without any spectral filters or image recovery algorithms. Furthermore, the spectral responsivity of this diffractive multispectral imager is not sensitive to input polarization states. Through numerical simulations, we present different diffractive network designs that achieve snapshot multispectral imaging with 4, 9 and 16 unique spectral bands within the visible spectrum, based on passive spatially-structured diffractive surfaces, with a compact design that axially spans ~72 times the mean wavelength of the spectral band of interest. Moreover, we experimentally demonstrate a diffractive multispectral imager based on a 3D-printed diffractive network that creates at its output image plane a spatially-repeating virtual spectral filter array with 2x2=4 unique bands at terahertz spectrum. Due to their compact form factor and computation-free, power-efficient and polarization-insensitive forward operation, diffractive multispectral imagers can be transformative for various imaging and sensing applications and be used at different parts of the electromagnetic spectrum where high-density and wide-area multispectral pixel arrays are not widely available.
Privacy-preserving inference via edge or encrypted computing paradigms encourages users of machine learning services to confidentially run a model on their personal data for a target task and only share the model's outputs with the service provider; e.g., to activate further services. Nevertheless, despite all confidentiality efforts, we show that a ''vicious'' service provider can approximately reconstruct its users' personal data by observing only the model's outputs, while keeping the target utility of the model very close to that of an ''honest'' service provider. We show the possibility of jointly training a target model (to be run at users' side) and an attack model for data reconstruction (to be secretly used at server's side). We introduce the ''reconstruction risk'': a new measure for assessing the quality of reconstructed data that better captures the privacy risk of such attacks. Experimental results on 6 benchmark datasets show that for low-complexity data types, or for tasks with a larger number of classes, a user's personal data can be approximately reconstructed from the outputs of a single target inference task. We propose a potential defense mechanism that helps to distinguish vicious vs. honest classifiers at inference time. We conclude this paper by discussing current challenges and open directions for future studies. We open-source our code and results, as a benchmark for future work.
Federated learning (FL) is a promising approach to enable the future Internet of vehicles consisting of intelligent connected vehicles (ICVs) with powerful sensing, computing and communication capabilities. We consider a base station (BS) coordinating nearby ICVs to train a neural network in a collaborative yet distributed manner, in order to limit data traffic and privacy leakage. However, due to the mobility of vehicles, the connections between the BS and ICVs are short-lived, which affects the resource utilization of ICVs, and thus, the convergence speed of the training process. In this paper, we propose an accelerated FL-ICV framework, by optimizing the duration of each training round and the number of local iterations, for better convergence performance of FL. We propose a mobility-aware optimization algorithm called MOB-FL, which aims at maximizing the resource utilization of ICVs under short-lived wireless connections, so as to increase the convergence speed. Simulation results based on the beam selection and the trajectory prediction tasks verify the effectiveness of the proposed solution.
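The trade-off MOB-FL optimizes can be illustrated with a toy model: a longer round lets each connected vehicle run more local iterations, but fewer vehicles stay connected long enough to upload their update. This sketch is an assumption-laden simplification (exponential connection lifetimes, a linear iteration rate, the names `expected_utilization` and `iter_rate`), not the paper's actual formulation.

```python
import numpy as np

def expected_utilization(T, sojourn, iter_rate=10.0):
    """Toy utility: each vehicle contributes iter_rate * T local iterations,
    but only if its connection to the BS lasts at least the round duration T."""
    finished = sojourn >= T
    return iter_rate * T * finished.mean()

rng = np.random.default_rng(0)
sojourn = rng.exponential(scale=5.0, size=10_000)  # connection lifetimes (s)

# Sweep candidate round durations and keep the one maximizing utilization.
grid = np.linspace(0.5, 15.0, 30)
best_T = max(grid, key=lambda T: expected_utilization(T, sojourn))
print(round(float(best_T), 2))
```

With exponentially distributed lifetimes of mean 5 s, the utility is roughly proportional to T·e^(-T/5), so the sweep lands near T = 5 s: long enough to do useful local work, short enough that most vehicles finish before driving out of range.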